1 Introduction

In the presence of new physics (NP) phenomena, sources of CP violation in b-hadron decays can arise in addition to those predicted by the Standard Model (SM) [1]. In the \( B_{s}^{0} \rightarrow J/\psi \phi \) decay, CP violation occurs due to interference between a direct decay and a decay with \(B_{s}^{0} \)\(\bar{B}^{0}_{s}\) mixing. The oscillation frequency of \(B_{s}^{0}\) meson mixing is characterised by the mass difference, \(\Delta m_{s} \), of the heavy (\(B_{\mathrm H} \)) and light (\(B_{\mathrm L} \)) mass eigenstates. The CP-violating phase \(\phi _{s} \) is defined as the weak phase difference between the \(B_{s}^{0} \)\(\bar{B}^{0}_{s}\) mixing amplitude and the \(b \rightarrow c \overline{c} s\) decay amplitude. In the SM the phase \(\phi _{s} \) is small and is related to the Cabibbo–Kobayashi–Maskawa (CKM) quark mixing matrix elements via the relation \(\phi _{s} \simeq -2 \beta _{s}\), with \(\beta _{s} = \mathrm {arg} [- (V_{ts} V^{*}_{tb})/(V_{cs} V_{cb}^{*}) ]\). By combining beauty and kaon physics observables, and assuming no NP contributions to \(B_{s}^{0}\) mixing and decays, a value of \(-2 \beta _{s} = -0.03696^{+0.00072}_{-0.00082}\) rad was predicted by the CKMFitter group [2] and \(-2 \beta _{s} = -0.03700 \pm 0.00104\) rad according to the UTfit Collaboration [3]. While large NP enhancements of the mixing amplitude have been excluded by the precise measurement of the oscillation frequency [4], the NP couplings involved in the mixing may still increase the size of the observed CP violation by enhancing the mixing phase \(\phi _{s} \) with respect to the SM value.

Other physical quantities involved in \(B_{s}^{0} \)\(\bar{B}^{0}_{s}\) mixing are the decay width \( \Gamma _{s}=(\Gamma _{\mathrm L} + \Gamma _{\mathrm H} )/2\) and the width difference \( \Delta \Gamma _{s}= \Gamma _{\mathrm L} - \Gamma _{\mathrm H} \), where \(\Gamma _{\mathrm L} \) and \(\Gamma _{\mathrm H} \) are the decay widths of the light and heavy mass eigenstates, respectively. The latest predictions for the width difference in the SM are \( \Delta \Gamma _{s}= 0.091 \pm 0.013\) ps\(^{-1}\) [5] and \( \Delta \Gamma _{s}= 0.092 \pm 0.014\) ps\(^{-1}\) [6]. A potential NP enhancement of \(\phi _{s} \) would also decrease the size of \( \Delta \Gamma _{s}\), but it is not expected to be affected as significantly as \(\phi _{s} \) [7]. Nevertheless, extracting \( \Delta \Gamma _{s}\) from the data is an important test of theoretical predictions [7].

Theory predictions have been made for the lifetime ratios \(\tau \)(\(B_{s}^{0}\))/\(\tau (B_d)\) and \(\tau \)(\(B_{s}^{0}\))/\(\tau (B^+)\), with the latest update Ref. [8]. The lifetime \(\tau \)(\(B_{s}^{0}\)) has not been calculated in theory yet at a precision comparable with those obtained by experiments. The current world combined value of the decay width, \( \Gamma _{s}\), obtained from experimental results is \( \Gamma _{s}= 0.6600\pm 0.0016\) ps\(^{-1}\) [9].

The analysis of the time evolution of the \( B_{s}^{0} \rightarrow J/\psi \phi \) decay provides the most precise determination of \(\phi _{s} \) and \( \Delta \Gamma _{s}\). Previous measurements of these quantities have been reported by the D0, CDF, LHCb, ATLAS and CMS Collaborations [10,11,12,13,14,15,16,17]. Additional improvements in measuring \(\phi _{s} \) from \(B_{s}^{0} \rightarrow \psi (2S) \phi \), \(B_{s}^{0} \rightarrow D_{s} ^+D_{s} ^-\) and \(B_{s}^{0} \rightarrow J/\psi \pi ^+\pi ^-\) decays have been achieved by the LHCb Collaboration [18,19,20,21].

The analysis presented here introduces a measurement of the \( B_{s}^{0} \rightarrow J/\psi \phi \) decay parameters using \( 80.5\, \mathrm {fb^{-1}} \) of the LHC proton–proton (pp) data collected by the ATLAS detector during 2015–2017, at a centre-of-mass energy, \(\sqrt{s}\), equal to 13 \(\text {Te}\text {V}\). The analysis closely follows a previous ATLAS measurement [13] that was performed using 19.2 \(\mathrm {fb}^{-1}\) of the data collected at 7 and 8 \(\text {Te}\text {V}\), and introduces more precise signal and background models.

2 ATLAS detector and Monte Carlo simulation

The ATLAS detectorFootnote 1 consists of three main components: an inner detector (ID) tracking system immersed in a 2 T axial magnetic field, electromagnetic and hadronic calorimeters, and a muon spectrometer (MS). The inner tracking detector covers the pseudorapidity range \(|\eta | < 2.5\), and consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. The ID is surrounded by a high-granularity liquid-argon (LAr) sampling electromagnetic calorimeter. A steel/scintillator tile calorimeter provides hadronic coverage in the central rapidity range. The endcap and forward regions are equipped with LAr calorimeters for electromagnetic and hadronic measurements. The MS surrounds the calorimeters and provides a system of tracking chambers and detectors for triggering. A full description can be found in Refs. [22,23,24].

The data were collected during periods with different instantaneous luminosity, so several triggers were used in the analysis. All triggers were based on the identification of a \( J/\psi \rightarrow \mu ^{+} \mu ^{-} \) decay, with transverse momentum (\(p_{\mathrm{T}}\)) thresholds of either 4 \(\text {Ge}\text {V}\) or 6 \(\text {Ge}\text {V}\) for the muons. Data quality requirements are imposed on the data, notably on the performance of the MS, ID and calorimeter systems. The measurement uses \( 80.5\, \mathrm {fb^{-1}} \) of pp collision data. The uncertainty in the combined 2015–2017 integrated luminosity is 2.0% [25], obtained using the LUCID-2 detector [26] for the primary luminosity measurements.

To study the detector response, estimate backgrounds, and model systematic effects, 100M Monte Carlo (MC) simulated \( B_{s}^{0} \rightarrow J/\psi \phi \) events were generated using Pythia 8.210 [27] tuned with ATLAS data, using the A14 set of parameter values [28] together with the CTEQ6L1 set of parton distribution functions [29]. The detector response was simulated using the ATLAS simulation framework based on Geant4 [30, 31]. In order to account for the varying number of proton–proton interactions per bunch crossing (pile-up) and trigger configurations during data-taking, the MC events were weighted to reproduce the same pile-up and trigger conditions as in the data. Additionally, background samples of both the exclusive (\(B_{d}^{0} \rightarrow J/\psi K^{0*}\) and \(\Lambda _b \rightarrow J/\psi p K^{-}\)) and inclusive (\(b\bar{b} \rightarrow J/\psi X\) and \(pp \rightarrow J/\psi X\)) decays were simulated, using the same simulation tools as in the case of the signal events. For validation studies related to flavour tagging, detailed in Sect. 4, events with \(B^{\pm }\rightarrow J/\psi K^{\pm }\) exclusive decays were also simulated.

3 Reconstruction and candidate selection

The reconstruction and candidate selection for the decay \( B_{s}^{0} \rightarrow J/\psi (\mu ^{+} \mu ^{-}) \phi (K^+K^-) \) is described here. Events must pass the trigger selections described in Sect. 2. In addition, each event must contain at least one reconstructed primary vertex, formed from at least four ID tracks, and at least one pair of oppositely charged muon candidates that are reconstructed using information from the MS and the ID. The muons used in the analysis are required to meet the TightFootnote 2 or Low-\({ p_{\mathrm{T}} }\)Footnote 3 working point identification criteria. The muon track parameters are determined from the ID measurement alone, since the precision of the measured track parameters is dominated by the ID track reconstruction in the \(p_{\mathrm{T}}\) range of interest for this analysis. Pairs of oppositely charged muon tracks are re-fitted to a common vertex and the pair is accepted if the quality of the fit meets the requirement \(\mathrm \chi ^{2}\)/ndof \(< 10\). In order to account for varying mass resolution in different parts of the detector, the \(J/\psi \) candidates are divided into three subsets according to the pseudorapidity \(\eta \) of the muons. In the first subset, both muons have \(|\eta | < 1.05\), where the values \(\eta = \pm 1.05\) correspond to the edges of the barrel part of the MS. In the second subset, one muon has \(1.05< |\eta | < 2.5 \) and the other muon \(|\eta | < 1.05\). The third subset contains candidates where both muons have \(1.05< |\eta | < 2.5 \). A maximum likelihood fit is used to extract the \(J/\psi \) mass and the corresponding mass resolution for these three subsets, and in each case the signal region is defined symmetrically around the fitted mass, so as to retain 99.7% of the \(J/\psi \) candidates identified in the fits.

The candidates for the decay \( \phi \rightarrow K^{+} K^{-} \) are reconstructed from all pairs of oppositely charged tracks, with \(p_{\mathrm{T}} > 1\) \(\text {Ge}\text {V}\) and \(|\eta | < 2.5\), that are not identified as muons. Candidate events for \( B_{s}^{0} \rightarrow J/\psi (\mu ^{+} \mu ^{-}) \phi (K^+K^-) \) decays are selected by fitting the tracks for each combination of \( J/\psi \rightarrow \mu ^{+} \mu ^{-} \) and \( \phi \rightarrow K^{+} K^{-} \) to a common vertex. The fit is also constrained by fixing the invariant mass calculated from the two muon tracks to the \(J/\psi \) mass [33]. A quadruplet of tracks is accepted for further analysis if the vertex fit has \(\mathrm \chi ^{2}\)/ndof \(< 3\). For the \( \phi \rightarrow K^{+} K^{-} \) candidate, the invariant mass of the track pairs (using a charged kaon mass hypothesis) must fall within the interval \( 1.0085 \; {\text {Ge}\text {V}}< m(K^+ K^-) < 1.0305 \; {\text {Ge}\text {V}} \). The interval, chosen using MC simulation, is selected to retain 98% of true \( \phi \rightarrow K^{+} K^{-} \) decays. The \(B_{s}^{0}\) candidate with the lowest \(\chi ^2 /\)ndof is selected in events where more than one candidate passes all selections. In total, 2 977 526 \(B_{s}^{0}\) candidates are collected within the mass range of 5.150–5.650 \(\text {Ge}\text {V}\). This range is chosen to give enough background events in the sidebands of the mass distributions to allow precise determination of the properties of the background events.

The mean number of interactions per bunch crossing is 30, necessitating a choice of the best candidate for the primary vertex at which the \(B_{s}^{0}\) meson is produced. Primary vertex positions are recalculated after removing any tracks used in the \(B_{s}^{0}\) meson reconstruction. The variable used to select the best candidate for the primary vertex is the three-dimensional impact parameter, \(a_0\), which is calculated as the minimum distance between each primary vertex candidate and the line extrapolated from the reconstructed \(B_{s}^{0}\) meson vertex in the direction of the \(B_{s}^{0}\) momentum. The chosen primary vertex is the one with the smallest \(a_0\). A simulated dataset is used to estimate the fraction of \(B_{s}^{0}\) candidates where the incorrect production vertex is selected (\(12\%\)) and demonstrates that the mis-selection of reconstructed primary vertex does not bias the reconstructed proper decay time.

For each \(B_{s}^{0}\) meson candidate the proper decay time t is estimated using:

$$\begin{aligned} t = \frac{ L_{xy}\ m_{{}_{B}} }{ p_{ \mathrm {T}_{B} }}, \end{aligned}$$

where \(p_{ \mathrm {T}_{B} }\) is the reconstructed transverse momentum of the \(B_{s}^{0}\) meson candidate and \(m_{{}_{B}}\) denotes the mass of the \(B_{s}^{0}\) meson, taken from Ref. [33]. The transverse decay length, \(L_{xy}\), is the displacement in the transverse plane of the \(B_{s}^{0}\) meson decay vertex relative to the primary vertex, projected onto the direction of the \(B_{s}^{0}\) transverse momentum.

4 Flavour tagging

To identify, or tag, the flavour of a neutral B meson at the point of production, information is extracted using the decay of the other (or opposite) b-hadron that is produced from the pair production of b and \(\bar{b}\) quarks. This method is called opposite-side tagging (OST).

The OST algorithms each define a discriminating variable, based on charge information, which is sensitive to the flavour (i.e. b- or \(\bar{b}\)-quark) of the opposite-side b-hadron. The algorithms thus provide a probability that a signal B meson in a given event is produced in a given flavour. The calibration of the OST algorithms proceeds using data containing \(B^{\pm }\rightarrow J/\psi K^{\pm }\) candidate decays, where the charge of the kaon determines the flavour of the B meson, providing a self-tagging sample of events. These OST algorithms are calibrated as a function of the discriminating variable, using yields of signal \(B^{\pm }\) mesons extracted from fits to the data. Once calibrated, the OST algorithms are applied to \( B_{s}^{0} \rightarrow J/\psi (\mu ^{+} \mu ^{-}) \phi (K^+K^-) \) candidate events to provide a probability that each candidate was produced in a \(B_{s}^{0} \) or \(\bar{B}^{0}_{s}\) meson state, which is used in the maximum likelihood fit (described in Sect. 5). This approach assumes invariance of the OST algorithm with respect to the specific signal b-hadron type (i.e \(B^{\pm }\) meson or \(B_{s}^{0} \) meson), which is tested and the difference is considered as a systematic uncertainty.

4.1 \(B^\pm \rightarrow J/\psi K^\pm \) event selection

Candidate \(B^{\pm }\rightarrow J/\psi K^{\pm }\) decays are identified in a series of steps. First, \(J/\psi \) candidates are selected from oppositely charged muon pairs forming a good vertex, as described in Sect. 3. Each muon is required to have \(p_{\mathrm{T}} > 4\) \(\text {Ge}\text {V}\) and \(|\eta | < 2.5\). Dimuon candidates with invariant mass \(2.8< m(\mu ^+\mu ^-) < 3.4\) \(\text {Ge}\text {V}\), as determined from the re-fitted track parameters of the vertex, are retained for further analysis. To form the \(B^{\pm }\) candidate, an additional track is required, which is not identified as an electron or muon. The track is assigned the charged-kaon mass hypothesis and combined with the dimuon candidate using a vertex fit, performed with the mass of the dimuon pair constrained to the \(J/\psi \) mass. Prompt background contributions are suppressed by a requirement on the proper decay time of the \(B^{\pm }\) candidate of \(t > 0.2\) ps.

The tagging probabilities are determined from \(B^{+}\) and \(B^{-}\) signal events. These signal yields are derived from fits to the invariant mass distribution, \(m(J/\psi K^{\pm })\), and performed in intervals of the discriminating variables. To describe the \(B^{\pm }\rightarrow J/\psi K^{\pm }\) signal, two Gaussian functions with a common mean are used. An exponential function is used to describe the combinatorial background and a hyperbolic tangent function to parameterise the low-mass contribution from incorrectly or partially reconstructed b-hadron decays. A Gaussian function is used to describe the \(B^{\pm }\rightarrow J/\psi \pi ^{\pm }\) contribution, with fixed parameters taken from simulation except for the normalisation, which is a free parameter. A fit to the overall mass distribution is used to define the shapes of signal and backgrounds. Subsequent fits are performed in the intervals of the tagging discriminating variables, separately for \(B^{+}\) and \(B^{-}\) candidate events, with the normalisations and also the slope of the exponential function left free. The \(B^{+}\) and \(B^{-}\) signal yields are extracted from these fits. Figure 1 shows the invariant mass distribution of \(B^{\pm }\) candidates overlaid with a fit to all selected candidates, and including the individual fit components for the signal and backgrounds.

Fig. 1
figure 1

The invariant mass distribution for selected \(B^{\pm }\rightarrow J/\psi K^{\pm }\) candidates. Data are shown as points, and the overall result of the fit is given by the blue curve. The contributions from the combinatorial background component are indicated by the red dotted line, partially reconstructed b-hadron decays by the purple shaded area, and decays of \(B^{\pm }\rightarrow J/\psi \pi ^{\pm }\), where the pion is misassigned as a kaon, by the green dashed line

4.2 Flavour tagging methods

The flavour of the signal B meson at the point of production is inferred using several methods, which differ in their efficiency and discrimination power. The measured charge of a lepton (electron or muon) from the semileptonic decay of a B meson provides strong discrimination; however, the ATLAS sensitivity to \(b\rightarrow \ell \) transitions are diluted through processes that can change the charge of the observed lepton, such as through neutral B meson oscillations, or through the cascade decays \(b\rightarrow c\rightarrow \ell \). The separation power of lepton tagging is enhanced by considering a weighted sum of the charge of the tracks in a cone around the lepton. If no lepton is present, a weighted sum of the charge of the tracks in a jet associated with the opposite-side b-hadron decay is used to provide discrimination. This weighted sum, or cone charge, is defined as:

$$\begin{aligned} Q_{x} = \frac{ \sum ^{N\;\mathrm {tracks}}_{i} q_{i} \cdot (p_{\mathrm {T} i })^{\kappa }}{ \sum ^{N\;\mathrm {tracks}}_{i}(p_{\mathrm {T} i })^{\kappa }}, \end{aligned}$$
(1)

where \(x=\{\mu ,e,\mathrm {jet}\}\) refers to muon, electron, or jet charge, respectively, and the summation is made using the charge of the track, \(q_{i}\), and its \(p_{\mathrm {T} i }\), over a selected set of tracks, including the lepton, in a cone of size \(\Delta R = \sqrt{(\Delta \phi )^2+(\Delta \eta )^2}\), around the lepton or jet direction. The value of the parameter \(\kappa \) is optimised on each OST method, by determining the value of \(\kappa \) that maximises the tagging power (defined in Sect. 4.3). The requirements on the tracks and \(\Delta R\) are described below, dependent on the OST method.

Two subcategories of \(Q_{x}\) are considered: the first discrete category is used in the case where the cone charge is formed either from only one track or from more than one track of the same charge; this results in a cone charge of \(Q_{x}=\pm 1\). The second continuous category is used when more than one track is considered, and the sum contains tracks of both negative and positive charge. In the continuous case, \(Q_{x}\) is divided into intervals within the range \(-1<Q_{x}<1\) for each OST algorithm.

A probability \(P(B|Q_{x})\) is constructed, which is defined as the probability that a B meson is produced in a state containing a \({\bar{b}}\)-quark, given the value of the cone charge \(Q_{x}\). Since \(Q_{x}\) is evaluated on the opposite side, a large, negative value of \(Q_{x}\) tends to correspond to a higher value of \(P(B|Q_{x})\). An equivalent probability for the b-quark case is defined as \(P({\bar{B}}|Q_{x})\). Using the \(B^{\pm }\) calibration samples, \(P(Q_{x}|B^{\pm })\) for each tagging method used can be defined. The probability to tag a \(B_{s}^{0} \) meson as containing a \({\bar{b}}\)-quark is therefore given as \(P(B|Q_{x}) = P(Q_{x}|B^{+}) / (P(Q_{x}|B^{+}) + P(Q_{x}|B^{-}))\), and correspondingly \(P({\bar{B}}|Q_{x}) = 1-P(B|Q_{x})\). If there is no OST information available for a given \(B_{s}^{0} \) meson, a probability of 0.5 is assigned to that candidate.

4.2.1 Muon tagging

For muon-based tagging, at least one additional muon is required in the event, with \(p_{\mathrm{T}} >2.5\) \(\text {Ge}\text {V}\), \(|\eta |<2.5\) and \(|\Delta z|<5\) mm, where \(|\Delta z|\) is the difference in z between the primary vertex and the longitudinal impact parameter of the ID track associated with the muon. Muons are classified and kept if their identification quality selection working point is either Tight or Low-\(p_{\mathrm{T}} \); these categories are subsequently treated as distinct flavour tagging methods. For muons with \(p_{\mathrm{T}} >4\) \(\text {Ge}\text {V}\), Tight muons are the dominant category, with the Low-\({ p_{\mathrm{T}} }\) requirement typically identifying muons of \(p_{\mathrm{T}} <4\) \(\text {Ge}\text {V}\). In the case of multiple muons satisfying selection criteria in one event, Tight muons are chosen over Low-\(p_{\mathrm{T}} \) muons. Within the same muon category, the muon with the highest \(p_{\mathrm{T}} \) that passes the selections is used.

A muon cone charge variable, \(Q_{\mu }\), is constructed according to Eq. (1), with \(\kappa =1.1\) and the sum over the reconstructed ID tracks within a cone of size \(\Delta R=0.5\) around the muon direction. These tracks must have \(p_{\mathrm{T}} >0.5\) \(\text {Ge}\text {V}\), \(|\eta |<2.5\), and \(|\Delta z|<5\) mm. Tracks associated with the decay of a B meson signal candidate are excluded from the sum. In each interval of \(Q_{\mu }\), a fit to the \(J/\psi K^{\pm }\) invariant mass spectrum is performed and the number of signal events extracted. The fit model used is described in Sect. 4.1. Figure 2 shows the distributions of the muon cone charge using \(B^{\pm }\) signal candidates for Tight muons, and includes the tagging probability as a function of the cone charge variable. The corresponding distributions for Low-\(p_{\mathrm{T}} \) muons are shown in Fig. 3.

Fig. 2
figure 2

Cone charge distributions, \(-Q_{\mu }\), for Tight muons, shown for cases of discrete charge (left), and for the continuous distribution (right). For each plot, in red (blue), the normalised \(B^{+}\) (\(B^{-}\)) cone charge distribution is shown (corresponding to the right axis scale). A \(B^{+}\) (\(B^{-}\)) candidate is more likely to have a large negative (positive) value of \(Q_{\mu }\). Superimposed is the distribution of the tagging probability, \(P(B|Q_{\mu })\), as a function of the cone charge, derived from a data sample of \(B^{\pm }\rightarrow J/\psi K^{\pm }\) decays, and defined as the probability to have a \(B^{+}\) meson (on the signal-side) given a particular cone charge \(Q_{\mu }\). The fitted parameterisation, shown in black, is used as the calibration curve to infer the probability to have a \(B_{s}^{0} \) or \(\bar{B}^{0}_{s}\) meson at production in the decays to \( J/\psi \phi \)

Fig. 3
figure 3

Normalised cone charge distributions (shown against the right axis scale), \(-Q_{\mu }\), for \(B^{+}\) (\(B^{-}\)) events shown in red (blue) for Low-\(p_{\mathrm{T}} \) muons, for cases of discrete charge (left), and for the continuous distribution (right). Superimposed is the distribution of the tagging probability, \(P(B|Q_{\mu })\)

4.2.2 Electron tagging

Electrons are identified using ID and calorimeter information, and must satisfy the Medium electron quality criteria [34]. The ID track associated with the electron is required to have \(p_{\mathrm{T}} >0.5\) \(\text {Ge}\text {V}\), \(|\eta |<2.5\), and \(|\Delta z|<5\) mm. To reject electrons from the signal-side of the decay, electrons with \(\cos (\zeta _{b})>0.93\), where \(\zeta _{b}\) is the opening angle between the momentum of the signal B meson candidate and the electron momentum, are not considered. In the case of more than one electron passing the selection, the electron with the highest \(p_{\mathrm{T}} \) is chosen. Charged particle tracks within a cone of size \(\Delta R = 0.5\) are used to form the electron cone charge \(Q_{e}\), constructed according to Eq. (1), with \(\kappa =1.0\). The resulting electron cone charge distributions are shown in Fig. 4, together with the corresponding tagging probability.

Fig. 4
figure 4

Normalised cone charge distributions (shown against the right axis scale), \(-Q_{e}\), for \(B^{+}\) (\(B^{-}\)) events shown in red (blue) for electrons, for cases of discrete charge (left), and the continuous distribution (right). Superimposed is the distribution of the tagging probabilities, \(P(B|Q_{e})\)

4.2.3 Jet tagging

In the absence of a muon or electron, a jet identified as containing a b-hadron is required. Jets are reconstructed from calorimetric information [35] using the anti-\({k_t}\) algorithm [36, 37] with a radius parameter \(R = 0.4\). The identification of a b-tagged jet uses a multivariate algorithm MV2c10 [38], utilising boosted decision trees (BDT), which output a classifier value. Jets are selected if this value exceeds 0.56. This value is chosen to maximise the tagging power of the calibration sample. In the case of multiple selected jets, the jet with the highest value of the BDT output classifier is used. Jets associated with the signal decay are not considered in this selection.

Tracks within a cone of size \(\Delta R = 0.5\) around the jet axis are used to define a jet cone charge, \(Q_{\mathrm {jet}}\), constructed according to Eq. (1), where \(\kappa =1.1\) and the sum is over the tracks associated with the jet, with \(|\Delta z|<5\) mm, and excluding tracks from the decay of the signal B meson candidate. Figure 5 shows the distribution of the opposite-side jet cone charge for \(B^{\pm }\) signal candidates.

Fig. 5
figure 5

Normalised cone charge distributions (shown against the right axis scale), \(-Q_{\mathrm {jet}}\), for \(B^{+}\) (\(B^{-}\)) events shown in red (blue) for jets, for cases of discrete charge (left), and the continuous distribution (right). Superimposed is the distribution of the tag probability, \(P(B|Q_{\mathrm {jet}})\)

4.3 Flavour tagging performance

In order to quantify and compare the performance of the various tagging methods, three figure-of-merit terms are constructed, which describe: the fraction of events used by a given tagging method, the purity of the method, and the overall power of the tagging method in the sample. The efficiency, \(\epsilon _{x}\), of an individual tagging method is defined as the number of signal events tagged by that method divided by the total number of signal events in the sample. The purity of a particular flavour tagging method, called the dilution, is defined as \({\mathcal D}(Q_{x}) = 2 P({B}|Q_{x}) - 1\). The tagging power of a particular tagging method is then defined as \(T_{x}= \sum _{i} \epsilon _{x\,i} \cdot {\mathcal D}^2(Q_{x\,i})\), where the sum is over the probability distribution in intervals of the cone charge variable. An effective dilution, \(D_{x}=\sqrt{T_{x}/\epsilon _{x}}\), is calculated from the measured tagging power and efficiency.

By definition, there is no overlap between lepton-tagged and jet-charge-tagged events. The overlap between events with a muon (either Tight or Low-\(p_{\mathrm{T}} \)) and events with an electron corresponds to around 0.6% of all tagged events. In the case of multiply tagged events, the OST method is selected in the following order: Tight muon, electron, Low-\(p_{\mathrm{T}} \) muon, jet. However, the ordering of muon- and electron-tagged events is shown to have negligible impact on the final results. A summary of the tagging performance for each method and the overall performance on the \(B^{\pm }\) sample is given in Table 1.

4.4 Using tag information in the \(B^0_s\) fit

For the maximum likelihood fit performed on the \(B_{s}^{0} \) data, and described in detail in Sect. 5, the per-candidate probability, \(P(B|Q_{x})\), that the B meson candidate was produced in a state \(B_{s}^{0} \) (versus a \(\bar{B}^{0}_{s}\)) is provided by the calibrations derived from the \(B^{\pm }\rightarrow J/\psi K^{\pm }\) sample, described above, and shown in Figs. 2, 3, 4 and 5. Since the distributions of \(P(B|Q_{x})\) from signal \(B_{s}^{0} \) mesons and backgrounds can be expected to be different, separate probability density functions (PDFs) are necessary to describe these distributions in the likelihood function. These PDFs are defined as \(P_{\text {s}}(P(B|Q_{x}))\) and \(P_{\text {b}}(P(B|Q_{x}))\), describing the probability distributions for signal and background, respectively, and are derived from the sample of \(B_{s}^{0} \) candidates. For the exclusive decays \(B_d \rightarrow J/\psi K^{0*}\) and \(\Lambda _b \rightarrow J/\psi p K^{-}\) that are present in the sample of \(B_{s}^{0} \) candidates, \(P_{\text {s}}(P(B|Q_{x}))\) is used to model the probability distributions for these contributions (described further in Sect. 5.2). The PDFs consist of the fraction of events that are tagged with a particular method (or are untagged), the fractions of those events categorised as discrete or continuous, and for those that are continuous, a PDF of the corresponding probability distribution.

4.4.1 Continuous PDF

The parameterisations of the continuous PDF components of \(P_{\text {s,b}}(P(B|Q_{x}))\) for each OST method are defined as follows. In the sideband regions, \(5.150< m(J/\psi KK) <5.317\) \(\text {Ge}\text {V}\) and \(5.417< m(J/\psi KK) < 5.650\) \(\text {Ge}\text {V}\), unbinned maximum likelihood fits to the \(P(B|Q_x)\) distributions are performed to extract the background (continuous category) PDFs for \(P_{\text {b}}(P(B|Q_{x}))\). For the Tight muon and electron methods, the parameterisation has the form of the sum of a second-order polynomial and two exponential functions. A Gaussian function is used for the Low-\(p_{\mathrm {T}}\) muons. For the jet tagging algorithm an eighth-order polynomial is used.

For the signal, fits are performed to the \(P(B|Q_{x})\) distributions, using all events in the \(m(J/\psi KK)\) distributions to extract the signal (continuous category) PDFs for \(P_{\text {s}}(P(B|Q_{x}))\). In these fits, the parameters describing the background PDFs are fixed to their previously extracted values, as is the relative normalisation of signal and background, extracted from a fit to the \(m(J/\psi KK)\) distribution. For the signal PDFs, the Tight muon tagging method uses the sum of two exponential functions and a constant function to describe the signal. For the electron tagging method, the signal function has the form of the sum of a second-order polynomial and two exponential functions, and for the Low-\(p_{\mathrm {T}}\) muon and jet tagging methods a Gaussian function is used.

Table 1 Summary of tagging performances for the different flavour tagging methods on the sample of \(B^{\pm }\) signal candidates, as described in the text. Uncertainties shown are statistical only. The efficiency (\(\epsilon _{x}\)) and tagging power (\(T_{x}\)) are each determined by summing over the individual bins of the cone charge distribution. The effective dilution (\(D_{x}\)) is obtained from the measured efficiency and tagging power. For the efficiency, effective dilution, and tagging power, the corresponding uncertainty is determined by combining the appropriate uncertainties in the individual bins of each charge distribution

4.4.2 Discrete PDF

In the case where the cone charge is discrete, the fractions of events \(f_{+1}\) (\(f_{-1}\)) with cone charges \(+1\) (\(-1\)) are determined separately for signal and background using events from the signal and sideband regions of the \(B_{s}^{0}\) mass distribution (as defined in Sect. 3). The remaining fraction of events, \(1 - f_{+1} - f_{-1}\), corresponds to the continuous parts of the distribution. Positive and negative charges are equally probable for background candidates formed from a random combination of a \(J/\psi \) and a pair of tracks, but this is not necessarily the case for background candidates formed from a partially reconstructed b-hadron. Table 2 summarises the fractions \(f_{+1}\) and \(f_{-1}\) obtained from each tagging method for signal and background events.

Table 2 Fractions \(f_{+1}\) and \(f_{-1}\) of events with cone charges of \(+1\) and \(-1\), respectively, for signal and background events and for the different tagging methods. Only statistical uncertainties are given

The fractions of signal and background events tagged using the different OST methods are found using a similar sideband-subtraction method, and are summarised in Table 3.

To account for possible deviations of the data from the selected fit models, variations of the procedure described here are used to determine systematic uncertainties, as described in Sect. 6.

5 Maximum likelihood fit

An unbinned maximum likelihood fit is performed on the selected events to extract the parameter values of the \( B_{s}^{0} \rightarrow J/\psi (\mu ^{+} \mu ^{-}) \phi (K^+K^-) \) decay. The fit uses information about the reconstructed mass, m, the measured proper decay time, t, the measured mass uncertainty, \(\sigma _m\), the measured proper decay time uncertainty, \(\sigma _t\), the measured transverse momentum, \(p_{\mathrm{T}}\), the tagging probability, \(P(B|Q_{x})\), and the transversity angles, \(\Omega \), of each \( B_{s}^{0} \rightarrow J/\psi \phi \) decay candidate. The measured value of the proper decay time uncertainty, \(\sigma _t\), is calculated from the covariance matrix associated with the vertex fit for each candidate event. The transversity angles \(\Omega = (\theta _T,\psi _T,\phi _T)\) are defined in Sect. 5.1. The likelihood function is defined as a combination of the signal and background PDFs as follows:

$$\begin{aligned} \ln ~\mathcal{L}&= \sum _{i=1}^N w_i \cdot \mathrm {ln} [ f_{\text {s}} \cdot \mathcal{F}_{\text {s}}( m_{i}, t_{i}, \sigma _{m_{i}}, \sigma _{t_{i}}, \Omega _i , P_{i}(B|Q_{x}), { p _{\mathrm {T}_{i}}} )\nonumber \\&\quad + f_{\text {s}} \cdot f_{B^0} \cdot \mathcal{F}_{B^0} ( m_{i}, t_{i}, \sigma _{m_{i}}, \sigma _{t_{i}}, \Omega _i , P_{i}(B|Q_{x}), {{ p }_{\mathrm {T}_{i}}} ) \nonumber \\&\quad + f_{\text {s}} \cdot f_{\Lambda _b} \cdot \mathcal{F}_{\Lambda _b} ( m_{i}, t_{i}, \sigma _{m_{i}}, \sigma _{t_{i}}, \Omega _i , P_{i}(B|Q_{x}), {{ p }_{\mathrm {T}_{i}}} ) \nonumber \\&\quad + ( 1 - f_{\text {s}} \cdot (1 + f_{B^0} + f_{\Lambda _b}) ) \mathcal{F}_{\text {bkg}} ( m_{i}, t_{i}, \sigma _{m_{i}}, \sigma _{t_{i}}, \nonumber \\&\qquad \Omega _i, P_{i}(B|Q_{x}), { p _{\mathrm {T}_{i}}} ) ], \end{aligned}$$
(2)

where N is the number of selected candidates, \(w_i\) is a weighting factor to account for the trigger efficiency (described in Sect. 5.3). The terms \({ \mathcal F}_{\text {s} }\), \(\mathcal{F}_{B^0}\), \(\mathcal{F}_{\Lambda _b}\) and \(\mathcal{F}_{\text {bkg} }\) are the PDFs modelling the signal, \(B^0\) background, \(\Lambda _b\) background, and the other background distributions, respectively. The term \(f_{\text {s}}\) is the fraction of signal candidates and \(f_{B^0}\) and \(f_{\Lambda _b}\) are the background fractions of \(B^0\) mesons and \(\Lambda _b\) baryons misidentified as \(B_{s}^{0}\) candidates, calculated relative to the number of signal events. These background fractions are fixed to their expectation from the MC simulation, and variations are applied as part of the evaluation of the effects of systematic uncertainties. The mass \(m_{i}\), the proper decay time \(t_{i}\) and the decay angles \(\Omega _i \) are the values measured from the data for each event i. A detailed description of the signal PDF terms in Eq. (2) is given in Sect. 5.1. The three background functions are described in Sect. 5.2.

Table 3 Fractions of signal and background events tagged using the different methods. The efficiencies include both the continuous and discrete contributions. Only statistical uncertainties are quoted
Table 4 The ten time-dependent functions, \(\mathcal{O}^{(k)}(t)\) and the functions of the transversity angles \(g^{(k)}(\theta _T,\psi _T,\phi _T)\). The amplitudes \(|A_0(0)|^2\) and \(|A_\parallel (0)|^2\) are for the CP-even components of the \( B_{s}^{0} \rightarrow J/\psi \phi \) decay, \(|A_{\perp }(0)|^2\) is the CP-odd amplitude; they have corresponding strong phases \(\delta _{0}\), \(\delta _{\parallel }\) and \(\delta _{\perp }\). By convention, \(\delta _0\) is set to be zero. The S-wave amplitude \(|A_S(0)|^2\) gives the fraction of \( B_{s}^{0} \rightarrow J/\psi K^+K^- (f_{0}) \) and has a related strong phase \(\delta _S\). The factor \(\alpha \) is described in the text of Sect. 5.1. The ± and \({\mp }\) terms denote two cases: the upper sign describes the decay of a meson that was initially a \(B_{s}^{0}\) meson, while the lower sign describes the decays of a meson that was initially \(\bar{B}^{0}_{s}\)

5.1 Signal PDF

The PDF used to describe the signal events, \(\mathcal{F}_{\text {s}}\), has the following composition:

$$\begin{aligned}&\mathcal{F}_{\text {s}}( { m _{i}, t_{i},} \sigma _{m_{i}}, \sigma _{t_{i}}, \Omega _i, P_{i}(B|Q_{x}), { p _{\mathrm {T}_{i}}} ) \\&\quad = P_{\text {s}}(m_{i}|\sigma _{m_{i}}) \cdot P_{\text {s}}(\sigma _{m_{i}}|{{ p }_{\mathrm {T}_{i}}}) \cdot P_{\text {s}}(t_{i},\Omega _i|\sigma _{t_{i}},P_{i}(B|Q_{x})) \\&\qquad \cdot P_{\text {s}}(\sigma _{t_{i}}|{{ p }_{\mathrm {T}_{i}}}) \cdot P_{\text {s}}(P_{i}(B|Q_{x})) \cdot A(\Omega _i,{ p _{\mathrm {T}_{i}}}) \cdot P_{\text {s}}({ p _{\mathrm {T}_{i}}}). \end{aligned}$$

The mass term \(P_{\text {s}}(m_{i}|\sigma _{m_{i}})\) is modelled in the following way:

$$\begin{aligned} { P}_{\mathrm {s}} ( m _{i}|\sigma _{m_{i}} ) \equiv \frac{1}{ \sqrt{2\pi } S_m \sigma _{m_{i}} } \cdot \mathrm {e}^{ \frac{ -(m_i -m_{B_s})^{2} }{2( S_m \sigma _{m_{i}} )^{2} } }. \end{aligned}$$
(3)

The term \(P_{\text {s}}(m_{i}|\sigma _{m_{i}})\) uses per-candidate mass errors, \(\sigma _{m_{i}}\), calculated for each \( J/\psi \phi \) candidate from the covariance matrix associated with the four-track vertex fit. Each measured candidate mass is convolved with a Gaussian function with a width equal to \(\sigma _{m_{i}}\) multiplied by a scale factor \(S_m\), introduced to account for any mismeasurements. Both \(S_m\) and the mean value \(m_{B_s} \), which is the \(B_{s}^{0}\) meson mass, are free parameters determined in the fit.

The PDF term \(P_{\text {s}}(t_{i}, \Omega _i|\sigma _{t_{i}}, P_{i}(B|Q_{x}))\) takes into account the lifetime resolution, so each time element in Table 4 is convolved with a Gaussian function defined as:

$$\begin{aligned} {R}(t^{'}-t_{i}, \sigma _{t_{i}}) \equiv \frac{1}{\sqrt{2\pi }~S_{t}~\sigma _{t_{i}}}\cdot \mathrm {e}^{\frac{-(t^{'}_i-t_{i})^{2}}{2(S_{t} \sigma _{t_{i}})^{2}}}. \end{aligned}$$
(4)

\(S_{t}\) is a scale factor (a parameter of the fit) and \(\sigma _{t_{i}}\) is the per-candidate uncertainty on proper decay time \(t_{i}\). This convolution is performed numerically on an event-by-event basis and the value \(\sigma _{t_{i}}\) is measured for each \(B_{s}^{0}\) candidate, based on the tracking error matrix of the four final state particles. The probability term \(P_{\text {s}}(\sigma _{t_{i}}|{{ p }_{\mathrm {T}{i}}})\) is introduced to account for differences between signal and background events for the values of the per-candidate time errors. Distributions of this variable for signal and background described by gamma functions are shown in Fig. 6. The average value of the time error for signal events is 69 fs.

Fig. 6
figure 6

The proper decay time uncertainty distribution for data (black), and the fits to the background (blue) and the signal (purple) contributions. The total fit is shown as a red curve

The same approach was applied for the probability terms \(P_{\text {s}}(\sigma _{m_{i}}|{{ p }_{\mathrm {T}_{i}}})\) and \(P_{\text {s}}({{ p }_{\mathrm {T}i}})\) accounting for differences between signal and background events for the values of the per-candidate mass error and \({{ p }_{\mathrm {T}_{i}}}\) values, respectively. The tagging probability term for signal \(P_{\text {s}}(P_{i}(B|Q_{x}))\) is described in Sect. 4.4.

The term \(P_{\text {s}}(t_{i}, \Omega _i|\sigma _{t_{i}}, P_{i}(B|Q_{x})) \) is a joint PDF for the decay time t and the transversity angles \(\Omega \) for the \( B_{s}^{0} \rightarrow J/\psi (\mu ^{+} \mu ^{-}) \phi (K^+K^-) \) decay. Ignoring detector effects, the distribution for the time t and the angles \(\Omega \) is given by the differential decay rate [39]:

$$\begin{aligned} \frac{\mathrm {d}^4 \Gamma }{\mathrm {d}t\ \mathrm {d}\Omega }= \sum _{k=1}^{10}\mathcal{O}^{(k)}(t) g^{(k)}(\theta _T,\psi _T,\phi _T), \end{aligned}$$

where \(\mathcal{O}^{(k)}(t)\) are the time-dependent functions corresponding to the contributions of the four different amplitudes (\(A_0\), \(A_{||}\), \(A_\perp \), and \(A_S\)) and their interference terms, and \(g^{(k)}(\theta _T,\psi _T,\phi _T)\) are the angular functions. Table 4 shows the time-dependent and the angular functions of the transversity angles. The formulae for the time-dependent functions have the same structure for \(B_{s}^{0}\) and \(\bar{B}^{0}_{s}\) but with a sign reversal in the terms containing \(\Delta m_s\), which is a fixed parameter of the fit (using Ref. [33]). The formalism used throughout this analysis assumes no direct CP violation, i.e. \(\lambda = (q/p)(\bar{A}/A) = 1\). In Table 4, the parameter \(A_\perp (t)\) is the time-dependent amplitude for the CP-odd final-state configuration while \(A_0(t)\) and \(A_\parallel (t)\) correspond to CP-even final-state configurations. The amplitude \(A_S(t)\) gives the contribution from the CP-odd non-resonant \(B_s^0\rightarrow J/\psi K^+K^-\) S-wave state [40] (which includes the \(f_0\) meson). The corresponding functions are given in the last four lines of Table 4 (\(k=7\)–10). The amplitudes are parameterised by \(|A_i|\)e\(^{ i \delta _i }\), where \(i = \{ 0, ||, \perp , S \}\), \(\delta _0 = 0\) and are normalised such that \(|A_0(0)|^2 + |A_\perp (0)|^2 + |A_{\Vert }(0)|^2=1\). The amplitude \(|A_\perp (0)|\) is determined according to this condition, while the remaining three amplitudes are parameters of the fit. The value \(|A_S|^2\) is the ratio of the S-wave yield to the \(\phi \rightarrow K^+K^-\) yield in the interval of \(m(K^+K^-)\) used in the analysis. In the sum over the mass interval, the interference terms (lines 8–10 in Table 4) are corrected by a factor \(\alpha \) that takes into account the mass-dependent differences in absolute amplitude and phase between the \(\phi \rightarrow K^+K^-\) and the S-wave amplitudes. The correction is based on the Breit–Wigner description of the \(\phi \) and on model assumptions for the shape and the phase variations of the S-wave amplitude. The phase \(\delta _S\) is the phase difference between \(A_0(0)\) and the S-wave amplitudes at the \(\phi \rightarrow K^+K^- \) peak. The values of \(\alpha \) and the related systematic uncertainty are discussed in Sect. 6.

The angles (\(\theta _T,\psi _T,\phi _T\)), are defined in the rest frames of the final-state particles. The x-axis is determined by the direction of the \(\phi \) meson in the \(J/\psi \) rest frame, and the \(K^{+}K^{-}\) system defines the xy plane, where \(p_{y}(K^{+})>0\). The three angles are defined as:

  • \(\theta _T\), the angle between \(\vec {p}(\mu ^{+})\) and the normal to the xy plane, in the \(J/\psi \) meson rest frame,

  • \(\phi _T\), the angle between the x-axis and \(\vec {p}_{xy}(\mu ^{+})\), the projection of the \(\mu ^+\) momentum in the xy plane, in the \(J/\psi \) meson rest frame,

  • \(\psi _T\), the angle between \(\vec {p}(K^{+})\) and \(-\vec {p}(J/\psi )\) in the \(\phi \) meson rest frame.

The angular acceptance of the detector and the kinematic cuts on the angular distributions are included in the likelihood function through \(A(\Omega _i, {{ p }_{\text {T}i}})\). This is calculated using a four-dimensional binned acceptance method, applying an event-by-event efficiency according to the transversity angles (\(\theta _T,\psi _T,\phi _T\)) and the \(p_{\mathrm{T}}\) of the candidate. The \(p_{\mathrm{T}}\) binning is necessary, because the angular acceptance is influenced by the \(p_{\mathrm{T}}\) of the \(B_{s}^{0}\) candidate. The acceptance is calculated from the \( B_{s}^{0} \rightarrow J/\psi \phi \) MC events with additional weighting for \(p_{\mathrm{T}}\) and \(\eta \) distributions. In the likelihood function, the acceptance is treated as an angular acceptance PDF, which is multiplied with the time- and angle-dependent PDF describing the \( B_{s}^{0} \rightarrow J/\psi (\mu ^{+} \mu ^{-}) \phi (K^+K^-) \) decays. As both the acceptance and time- and angle-dependent decay PDFs depend on the transversity angles they must be normalised together. This normalisation is done numerically during the likelihood fit. The PDF is normalised over the entire \(B_{s}^{0}\) mass range, 5.150–5.650 \(\text {Ge}\text {V}\).

5.2 Background PDF

The background PDF has the following composition:

$$\begin{aligned}&\mathcal{F}_{\text {bkg}}( { m _{i}, t_{i}}, \sigma _{t_{i}}, \Omega _i, P_{i}(B|Q_{x}), { { p }_{\mathrm {T}_{i}}} ) \\&\quad = P_{\text {b}}(m_{i})\cdot P_{\text {b}}(t_{i}|\sigma _{t_{i}}) \cdot P_{\text {b}}(P_{i}(B|Q_{x})) \\&\qquad \cdot P_{\text {b}}(\Omega _i)\cdot P_{\text {b}}(\sigma _{m_{i}}|{{ p }_{\mathrm {T}_{i}}}) \cdot P_{\text {b}}(\sigma _{t_{i}}|{{ p }_{\mathrm {T}_{i}}}) \cdot P_{\text {b}}({{ p }_{\mathrm {T}_{i}}}). \end{aligned}$$

The proper decay time function \(P_{\text {b}}(t_{i}|\sigma _{t_{i}})\) is parameterised as a peak modelled by a Gaussian distribution, two positive-time exponential functions and two negative-time exponential functions. These functions are convolved with the same resolution function, defined in Eq. (4) as the signal decay time-dependence. The prompt peak models the combinatorial background events, which are expected to have reconstructed lifetimes distributed around zero. The two positive-time exponential functions represent a fraction of longer-lived backgrounds with non-prompt \(J/\psi \), combined with hadrons from the primary vertex or from a B/D meson in the same event. The two negative-time exponential functions take into account events with poor vertex resolution. The probability terms \(P_{\text {b}}(\sigma _{m_{i}}|{{ p }_{\mathrm {T}_{i}}})\), \(P_{\text {b}}(\sigma _{t_{i}}|{{ p }_{\mathrm {T}_{i}}})\) and \(P_{\text {b}}({{ p }_{\mathrm {T}_{i}}})\) are described by gamma functions. They are unchanged from the analysis described in Ref. [41] and explained in detail there. The tagging probability term for background events \(P_{\text {b}}(P_{i}(B|Q_{x}))\) is described in Sect. 4.4.

The shape of the background angular distribution, \(P_{\text {b}}(\Omega _i)\) arises primarily from detector and kinematic acceptance effects. The best description is achieved by Legendre polynomial functions:

$$\begin{aligned}&Y^m_l (\theta _T) = \sqrt{(2l+1)/(4\pi )}\sqrt{(l-m)!/(l+m)!}\ P^{|m|}_l(\cos \theta _T), \\&P_k(x) = \frac{1}{2^k k!} \frac{\mathrm {d}^k}{\mathrm {d}x^k}(x^2-1)^k, \\&P^{|m|}_l(x) = (-1)^{|m|}(1-x^2)^{|m|/2} \frac{\mathrm {d}^{|m|}}{\mathrm {d}x^{|m|}}(P_l(x)), \end{aligned}$$
$$\begin{aligned}&P_{\mathrm {b}}(\theta _T,\psi _T,\phi _T) \\&\quad = \sum _{k=0}^{k_{max}}\sum _{l=0}^{l_{max}}\sum _{m=-l}^{l} {\left\{ \begin{array}{ll} a_{k,l,m}\sqrt{2} Y^m_l(\theta _T) \cos (m\phi _T) P_k(\cos \psi _T) &{} \text {where } m > 0 \\ a_{k,l,m}\sqrt{2} Y^{-m}_l(\theta _T) \sin (m\phi _T) P_k(\cos \psi _T) &{} \text {where } m < 0 \\ a_{k,l,m}\sqrt{2} Y^{0}_l(\theta _T) P_k(\cos \psi _T) &{} \text {where } m = 0 \\ \end{array}\right. } \end{aligned}$$

where \(l_{max}=14\), \(k_{max}=14\) and the coefficients \(a_{k,l,m}\) are adjusted to give the best fit to the angular distributions for events in the sidebands of the \(B_{s}^{0}\) mass distribution. These parameters are then fixed in the main fit, defined by Eq. (2). The \(B_{s}^{0}\) mass interval used for the background fit is between 5.150 and 5.650 \(\text {Ge}\text {V}\) excluding the signal mass region \(|(m(B_s^0) -5.366|< 0.110\) \(\text {Ge}\text {V}\). Higher-order Legendre polynomial functions were tested as a systematic check, described in Sect. 6.

The background mass model, \(P_{\text {b}}({ m _{i}})\) is the sum of an exponential and a constant, with the exponential slope and the relative normalisation left as free parameters of the fit.

Contamination from \(B_d \rightarrow J/\psi K^{0*}\) and \(\Lambda _b \rightarrow J/\psi p K^{-}\) events misreconstructed as \( B_{s}^{0} \rightarrow J/\psi \phi \) is accounted for in the fit through the \(\mathcal{F}_{B^0}\) and \(\mathcal{F}_{\Lambda _b}\) terms in the PDF described in Eq. (2). The PDFs are determined using MC simulation of these decays. The fractions of these contributions, \(f_{B^0} = (4.3 \pm 0.5)\%\) and \(f_{\Lambda _b} = (2.1 \pm 0.6)\%\), are defined relative to the number of the \( B_{s}^{0} \rightarrow J/\psi \phi \) signal events and are evaluated from MC simulation using production cross sections and branching fractions from Refs. [33, 42,43,44,45,46].

MC simulated events are also used to determine the shape of the mass and transversity angle distributions. The 3D angular distributions of \(B_d^0\rightarrow J/\psi K^{0*}\) and of the conjugate decay are modelled using input from Ref. [47], while angular distributions for \(\Lambda _b \rightarrow J/\psi p K^{-}\) and the conjugate decay are considered flat. These distributions are sculpted for detector acceptance effects and then described by Legendre polynomial functions with \(l_{max}=10\) and \(k_{max}=10\), Eq. (5). These shapes are used as templates in the fit. The \(B_d\) and \(\Lambda _b\) lifetimes are accounted for in the fit by adding additional exponential terms, scaled by the ratio of \(B_d\)/\(B_s^0\) or \(\Lambda _b\)/\(B_s^0\) masses as appropriate, where the lifetimes and masses are taken from Ref. [33]. The PDF terms that describe each of the tagging, mass, decay time and \(p_{\mathrm{T}} \) probability distributions are taken from the same PDFs used to describe the \( B_{s}^{0} \rightarrow J/\psi \phi \) signal (described in Sects. 4.4 and 5.1) to account for the fact that these dedicated background events are fully reconstructed b-hadron decays. Systematic uncertainties due to the background from \(B_d \rightarrow J/\psi K^{0*}\) and \(\Lambda _b \rightarrow J/\psi p K^{-}\) decays are described in Sect. 6. The contribution of the S-wave \(B_d \rightarrow J/\psi K\pi \) decays as well as their interference with the P-wave \(B_d \rightarrow J/\psi K^{0*}\) decays are included in the PDF of the fit, using the parameters measured in Ref. [47].

5.3 Proper decay time dependence of muon trigger efficiency

In the triggers used in this analysis, there is no minimum cut applied on the transverse impact parameter \(d_0\) of muons. On the other side, trigger muons with values of \(d_0\) > 10 mm are not accepted. This results in inefficiency at large values of the proper decay time. This inefficiency is estimated using MC simulated events, by comparing the \(B_{s}^{0}\) proper decay time distribution obtained before and after applying the trigger selection. To account for this inefficiency in the fit, the events are reweighted by a factor w, inversely proportional to the trigger efficiency:

$$\begin{aligned} 1/w = p_0 \cdot [ 1 - p_1 \cdot (\text {Erf} ((t-p_3)/p_2) + 1)], \end{aligned}$$
(5)

where Erf denotes the error function and \(p_0, p_1, p_2\) and \(p_3\) are parameters determined in the fit to MC events. For more than 99% of the \(B_{s}^{0}\) candidates the inefficiency is below 2%. For the most affected candidates the inefficiency reaches up to 50% at high decay time. No significant bias or inefficiency due to offline track reconstruction, vertex reconstruction, or track quality selection criteria is observed.

5.4 Summary of the fit parameters

The joint PDF of proper decay time and decay angles includes the main physics parameters of interest:

  • the CP-violating phase \(\phi _{s} \),

  • the average decay width \( \Gamma _{s}\) and the decay width difference \( \Delta \Gamma _{s}\),

  • the size of the CP-state amplitudes at \(t=0\): \(|A_{\parallel }(0)|^2\), \(|A_{0}(0)|^2\) and their corresponding strong phases \(\delta _{\perp }\) and \(\delta _{\parallel }\)

  • and the size of the S-wave amplitude at \(t=0\): \(|A_{S}(0)|^2\) and corresponding strong phase \(\delta _{S}\).

The size of the remaining amplitude \(|A_{\perp }(0)|^2\) is constrained by the normalisation condition, phase \(\delta _{0}\) is set to zero and \(\Delta m_s\) is fixed as mentioned above.

The likelihood function also includes other parameters referred to as “nuisance parameters” such as: the \(B_{s}^{0}\) signal fraction \(f_s\), parameters describing the invariant mass and decay time-angular distributions of combinatorial background events and scale factors of mass and decay time uncertainties. In addition, there are also other nuisance parameters describing: acceptance functions, parametrisations of the angles of dedicated backgrounds \(B_d\rightarrow J/\psi K^{0*}\) and \(\Lambda _b\rightarrow J/\psi p K^{-}\) and their fractions \(f_{B^0}\) and \(f_{\Lambda _b}\), the probability density functions of time error distributions \(P(\sigma _{t_{i}}|p_{\mathrm {T}_{i}})\), mass error distributions \(P(\sigma _{m_{i}}|p_{\mathrm {T}_{i}})\), \(p_{\mathrm {T}}\) distributions \(P(p_{\mathrm {T}_{i}})\) and tagging parameters and calibrations. These parameter values are mainly fixed in the fit to the values extracted from the \(B_{s}^{0}\) mass signal and sideband regions or from MC simulations.

6 Systematic uncertainties

Systematic uncertainties are evaluated for effects that are described below.

  • Flavour tagging: The effects on the main physics parameters from the fit, due to uncertainties introduced by the flavour tagging procedure, are assessed as follows: The statistical uncertainty due to the size of the sample of \(B^\pm \rightarrow J/\psi K^\pm \) decays is included in the overall statistical uncertainty. The systematic uncertainty arising from the precision of the OST calibration, described in Sect. 4.2, is estimated by changing the models used to parameterise the probability distribution, \(P(B|Q_{x})\), as a function of the cone charge from the function used by default (a third-order polynomial for muons and a sinusoid for electrons) to one of several alternative functions: a linear function; a fifth-order polynomial; or two third-order polynomials that describe the positive and negative regions and have common constant and linear terms, but independent quadratic and cubic terms. The \(B_{s}^{0}\) fit is repeated using the alternative models and the largest deviation from the nominal fit is assigned as the systematic uncertainty. To validate the calibration procedure, calibration curves are derived from simulated samples of \(B^{\pm }\) and \(B_{s}^{0} \) signals. The variations between the curves from these two samples are propagated to the calibration curves derived from data. The differences in the parameter values between the nominal fit and that with the varied calibration curves are included in the systematic uncertainty.

    An additional systematic uncertainty is assigned to account for potential dependencies on the pile-up distribution. The calibration data are split into subsets of approximately equal size, separated according to the estimated pile-up of the event, and separate calibrations are made for each subset. For the \(B_{s}^{0}\) fit, the fit is repeated using the calibrations corresponding to the estimated pile-up of that event. Differences between the nominal and the modified fit for the parameters of interest are taken as the systematic uncertainty. For the terms \(P_{\mathrm {b}}(P(B|Q_{x}))\) and \(P_{\mathrm {s}}(P(B|Q_{x}))\), variations of the parameterisation are considered (including using histograms in place of a parameterisation). The resulting changes in the parameter values of the \(B_{s}^{0} \) fit are similarly included in the systematic uncertainties.

  • ID alignment: The changes of the fit parameters due to residual misalignments of the ID were studied and the observed deviations are included in the systematic uncertainties.

  • Angular acceptance method: The angular acceptance of the detector and the kinematic cuts, \(A(\Omega _i,{ p _{\mathrm {T}_{i}}})\), described in Sect. 5.1, is calculated from a binned fit to MC simulated data. In order to estimate the systematic uncertainty introduced by the choice of binning, different acceptance functions are calculated using different numbers of \(p_{\mathrm{T}}\) bins as well as different widths and central values of the bins.

  • Time efficiency: To correct for the proper decay time dependence of trigger inefficiencies, the events are reweighted according to Eq. (5). To estimate systematic uncertainties connected with this procedure, several alternative fits are performed. Firstly, using different sets of \(p_{\mathrm{T}}\) binning in the MC sample used to determine the efficiency. Secondly, to assess the effects of mis-modelling of the \(B_{s}^{0}\) vertex \(\chi ^2 /\)ndof in simulated data, an alternative fit is done with MC \(\chi ^2 /\)ndof reweighted according to data. Finally, the groups of similar triggers used to determine the efficiencies were subdivided further by their common features. These smaller groups allowed to address even more details of trigger varieties in time-efficiency determination. Deviations from the default fit result are included in the systematic uncertainties of the measurement.

  • Best candidate selection: After applying all selection cuts for \(B_{s}^{0}\) signal events, approximately \(5\%\) of the events are found to contain multiple candidates. In the default fit, the \(B_{s}^{0}\) candidate with the lowest \(\chi ^2 /\)ndof is selected. To assess the systematic uncertainty due to this selection, an equivalent sample is created where all candidates in the event are retained, each weighted by a factor of \(1/N_{\mathrm {cand}}\), where \(N_{\mathrm {cand}}\) is number of \(B_{s}^{0}\) candidates in the event. Deviations from the default fit are included in the systematic uncertainties of the measurement.

  • Background angles model: The shape of the background angular distribution, \(P_{\text {b}}(\theta _T, \varphi _T, \psi _T)\), is described by the fourteenth-order Legendre polynomial functions, given in Eq. (5). Alternatively, higher-order Legendre polynomial functions with \(l_{max}=16\) and \(k_{max}=16\) were tested, and the changes in the fit parameter values relative to the default fit are taken as systematic uncertainties.

    The shapes are primarily determined by detector and kinematic acceptance effects and are sensitive to the \(p_{\mathrm{T}}\) of the \(B_{s}^{0}\) meson candidate. For this reason, the parameterisation using the Legendre polynomial functions is performed in six \(p_{\mathrm{T}}\) intervals: 10–15 \(\text {Ge}\text {V}\), 15–20 \(\text {Ge}\text {V}\), 20–25 \(\text {Ge}\text {V}\), 25–30 \(\text {Ge}\text {V}\), 30–35 \(\text {Ge}\text {V}\) and >35 \(\text {Ge}\text {V}\). The systematic uncertainties due to the choice of \(p_{\mathrm{T}}\) intervals are estimated by repeating the fit, with these intervals enlarged and reduced by 1 \(\text {Ge}\text {V}\) and by 2 \(\text {Ge}\text {V}\). The largest changes in the fit results are taken to represent the systematic uncertainties.

    The sensitivity of the fit to the choice of the invariant mass window is tested by varying its size. The difference to the default fit results is assigned as a systematic uncertainty.

    The parameters of the Legendre polynomial functions given in Eq. (5) are adjusted to give the best fit to the angular distributions for events in the \(B_{s}^{0}\) mass sidebands. To test the sensitivity of the fit results to the choice of sideband regions, the fit is repeated with alternative choices for the excluded signal mass regions: \(|(m(B_s^0) - 5.366 \text {Ge}\text {V}|> 0.085\) \(\text {Ge}\text {V}\) and \(|(m(B_s^0) - 5.366 \text {Ge}\text {V}|> 0.160\) \(\text {Ge}\text {V}\) (instead of the default \(|(m(B_s^0) - 5.366 \text {Ge}\text {V}|> 0.110\) \(\text {Ge}\text {V}\)). The changes in the fit results are assigned as systematic uncertainties.

  • \({\varvec{B}}_{\varvec{d}}\) contribution: The contamination from \(B_d~\rightarrow ~J/\psi ~K^{0*}\) events misreconstructed as \( B_{s}^{0} \rightarrow J/\psi \phi \) is accounted for in the final fit. Studies are performed to evaluate the systematic uncertainties due to trigger effects, the \(B_d \rightarrow J/\psi K^{0*}\) fraction, and the distributions of the mass, transversity angles, and lifetime PDFs. In the MC events the angular distribution of the \(B_d \rightarrow J/\psi K^{0*}\) decay is modelled using parameters taken from Ref. [47]. The contribution of the S-wave \(B_d \rightarrow J/\psi K\pi \) decays as well as its interference with the P-wave \(B_d \rightarrow J/\psi K^{0*}\) decays are also included in the PDF of the fit, following the parameters measured in Ref. [47]. The uncertainties of these parameters are taken into account in the estimation of the systematic uncertainty. After applying the \(B_{s}^{0}\) signal selection cuts, the angular distributions are fitted using Legendre polynomial functions. The uncertainties of this fit are included in the systematic uncertainty.

  • \({\varvec{\Lambda }}_{\varvec{b}}\) contribution: The contamination from \(\Lambda _b \rightarrow J/\psi p K^{-}\) events misreconstructed as \( B_{s}^{0} \rightarrow J/\psi \phi \) is accounted for in the final fit. Studies are performed to evaluate the effect of the uncertainties in the \(\Lambda _b \rightarrow J/\psi p K^{-}\) fraction \(f_{\Lambda _b}\), and the shapes of the distributions of the mass, transversity angles, and lifetime. Additional studies are performed to determine the effect of the uncertainties in the \(\Lambda _b \rightarrow J/\psi \Lambda ^{*}\) branching ratios used to reweight the generated MC sample.

  • Alternate \({\varvec{\Delta }} {\varvec{m}}_{{\varvec{s}}}\): The systematics due to fixing the parameter \(\Delta m_{s}\) to the PDG value were estimated by running an alternative fit where the default model was altered by releasing \(\Delta m_{s}\) within a Gaussian constraint of which the width was taken from uncertainties assigned to \(\Delta m_{s}\) in [33]. The resulting changes to all the fit parameter values are found to be negligible except for \(\delta _{\perp }\). The effects on the other variables are small for the parameters \(\phi _{s} \) and \(\delta _{\parallel }\).

  • Fit model mass and lifetime: To estimate the systematic uncertainties due to the signal \(B_{s}^{0}\) mass model, the default model was altered by adding a second Gaussian function in Eq. (3), which has the same structure as the first Gaussian function but a different scale factor, \(S^1_m\), which is an additional free parameter of the fit. The resulting changes to other fit parameter values are found to be negligible.

    To test the sensitivity of the part of the fit model describing the lifetime, two systematic tests are performed. The determination of signal and background lifetime errors is sensitive to the choice of \(p_{\mathrm{T}}\) bins, in which the relative contributions of these two components are evaluated. To estimate the systematic uncertainty, the fit is repeated varying the intervals of the default \(p_{\mathrm{T}}\) binning. The determination of signal and background lifetime errors is also sensitive to the determination of the signal fraction. The fit is repeated by varying this fraction within one standard deviation of its uncertainty and differences are included in the systematic uncertainty.

  • Fit model \({\varvec{S}}\)-wave phase: As explained in Sect. 5.1, the model for the interference between \(B_s^0 \rightarrow J/\psi \phi (K^+K^-)\) and S-wave \(B_s^0 \rightarrow J/\psi K^+K^-\) is corrected by a factor \(\alpha \) to account for the mass-dependent variations in absolute amplitude and phase of the two amplitudes within the interval 1.0085–1.0305 GeV in \(m(K^+K^-)\). The value of \(\alpha \) is 0.51± 0.08. The central value is obtained under the hypothesis of uniformity of the S-wave amplitude. The uncertainty is systematic and it is due to: detector mass resolution and mass scale uncertainties, uncertainties in the description of the \(\phi \) resonance (mass, width, and shape descriptions as relativistic Breit–Wigner or Flatté parameterisation  [48]), and to uncertainties in the description of the shape and phase variation of the S-wave amplitude. For the last effect, which is dominant, the S-wave amplitude is assumed to be due to the \(f_0\)(980) resonance and it is described as in Ref.  [49], similarly to what was done in Ref.  [16]. The variation from the value of \(\alpha \) obtained with the default assumption is taken as a systematic uncertainty. This procedure uses descriptions of the \(f_0\)(980) based on other measurements [33]. To account for the uncertainty in \(\alpha \), the fit was repeated with \(\alpha = 0.51 + 0.08\) and \(\alpha = 0.51 - 0.08\) values. The variations of the parameter values relative to those from the default fit using the central value of \(\alpha \) are included in the systematic uncertainties.

  • Fit bias: Due to its complexity, the fit model can be sensitive to some nuisance parameters. This limited sensitivity could potentially lead to a bias in the measured physics parameters, even when the model describes the fitted data well. To test the stability of the results obtained from the chosen default fit model, a set of pseudo-experiments is conducted using the default model in both the generation and fit. The systematic uncertainties are determined from the mean of the pull distributions of the pseudo-experiments scaled according to the statistical uncertainty of that parameter in the fit to data. The observed deviations are included in the systematic uncertainties.

The systematic uncertainties are listed in Table 5. For each parameter, the total systematic uncertainty is obtained by adding all of the contributions in quadrature.

7 Results

7.1 Fit results

The results of the likelihood fit are shown in Table 6. The total number of \(B_{s}^{0}\) meson candidates is \( 453\,570 \pm 740 \). The fitted value of the \(B_{s}^{0}\) mass agrees well with the world average value [33]. Fit projections, including ratio plots, are shown in Fig. 7 for the mass and proper decay time and in Fig. 8 for the angles. The ratio plots show the difference between each data point and the total fit line divided by the statistical and systematic uncertainties summed in quadrature (\(\sigma \)) for that point. The deviations of ratio plots are within \(2\sigma \), which shows that the total uncertainties cover any discrepancy between data and fit model.

Table 5 Summary of systematic uncertainties assigned to the physical parameters of interest
Table 6 Fitted values for the physical parameters of interest with their statistical and systematic uncertainties. For variables \(\delta _\perp \) and \(\delta _{\parallel }\) the values are given for the two solutions (a) and (b). The difference in - 2\(\Delta \)ln(L) between solutions (b) and (a) is 0.03. For the rest of the variables the values for the two minima are consistent. The same is true for the statistical and systematic uncertainties of all the variables
Fig. 7
figure 7

(Left) Mass fit projection for the \( B_{s}^{0} \rightarrow J/\psi \phi \) sample. The red line shows the total fit, the short-dashed magenta line shows the \( B_{s}^{0} \rightarrow J/\psi \phi \) signal component, the combinatorial background is shown as a blue dotted line, the orange dash-dotted line shows the \(B_{d}^{0} \rightarrow J/\psi K^{0*}\) component, and the green dash-dot-dot line shows the contribution from \(\Lambda _b \rightarrow J/\psi pK^{-}\) events. (Right) Proper decay time fit projection for the \( B_{s}^{0} \rightarrow J/\psi \phi \) sample. The red line shows the total fit while the short-dashed magenta line shows the total signal. The total background is shown as a blue dotted line, and a long-dashed grey line shows the prompt \(J/\psi \) background component. Below each figure is a ratio plot that shows the difference between each data point and the total fit line divided by the statistical and systematic uncertainties summed in quadrature (\(\sigma \)) of that point

Fig. 8
figure 8

Fit projections for the transversity angles \(\phi _T\) (top left), \(\cos (\theta _T)\) (top right), and \(\cos (\psi _T)\) (bottom). In all three plots the red solid line shows the total fit, the \( B_{s}^{0} \rightarrow J/\psi \phi \) signal component is shown by the magenta dashed line and the blue dotted line shows the contribution of all background components. Below each figure is a ratio plot that shows the difference between each data point and the total fit line divided by the statistical and systematic uncertainties summed in quadrature (\(\sigma \)) of that point

While for most of the physics parameters, including \(\phi _{s} \), \( \Delta \Gamma _{s}\) and \( \Gamma _{s}\), the fit determines a single solution with Gaussian behaviour of the projection of the log-likelihood (see Fig. 10 in Sect. 7.2), for the strong-phases \(\delta _{\parallel }\) and \(\delta _\perp \) two well separated local maxima of the likelihood are found, and shown as solution (a) and (b) in Table 6. The difference in \(-2\Delta \)ln(L) between the two solutions is 0.03. As discussed in Sect. 7.2, the two-fold behaviour of the likelihood in the strong phases is the result of an approximate symmetry of the signal PDF. The effect is completely negligible for all other variables, for which the fit values and uncertainty ranges overlap accurately.

The correlation between statistical uncertainties of the parameters are also shown in Table 7 for solution (a) and in Table 8 for solution (b). The correlations do not change significantly between the physics variables which remain stable between the solutions (a) and (b). The correlations with the two strong phases: \(\delta _{\parallel }\) and \(\delta _\perp \) flip the sign, as expected, except for the correlations that are smaller than  0.01. The correlations between \(\delta _{\parallel }\) and \(\delta _\perp \) themselves keep the same sign in both solutions (a) and (b).

Releasing the direct CP related \(\lambda \) parameter (see Sect. 5.1) in the fit results in a \(\lambda \) value compatible within the statistical precision with unity as well as with results of other experiments. The effect on the values of the other parameters of interest is negligible, due to small correlations with \(\lambda \).

7.2 Fit to strong phases

As shown in Table 6, the likelihood fit has determined two solutions with well separated values for the strong phases \(\delta _{||}\) and \(\delta _{\perp }\). Figure 9 shows results of the 2D log-likelihood scan in the \(\delta _{||}\), \(\delta _{\perp }\) plane, revealing two minima, identified at (\(\delta _{||} = 3.35\), \(\delta _{\perp }=3.12\)) and (\(\delta _{||} = 2.94\), \(\delta _{\perp }=2.91\)). These minima are represented by two-dimensional contours at the level of \(-~2 \Delta { \ln (L)} = 2.30, 6.18,\) and 11.83, where \(-~2 \Delta { \ln (L)} = 2 {(\ln (L^i) - \ln (L^{a}))} \) is the difference between the likelihood values \( {(L^i)}\) of the fit in which the two strong phases are fixed to the values shown on the horizontal and vertical axis, and \( L^{a}\) which is the likelihood value for solution (a) of the fit.

An approximate symmetry in the signal PDF is at the origin of this duality. The strong phases are determined by the six interference terms 4) − 6) and 8) − 10) in Table  4. Four of them, namely terms 4), 5), 8) and 9), are invariant under the transformation

$$\begin{aligned} \{ \delta _{\parallel }, \delta _{\perp }, \delta _{S} \} \rightarrow \{ 2 \pi -\delta _{\parallel }, \delta _{\perp } + 2 (\pi -\delta _{\parallel }), \delta _{S} + 2 (\pi - \delta _{\parallel }) \}. \end{aligned}$$
(6)

This transformation is proportional to \(\pi -\delta _{\parallel }\), which from our data is equal to 0.21. The two local maxima of the likelihood determined in this analysis satisfy the relation of Eq. (6) very accurately for \(\delta _{||}\) and \((\delta _{\perp }\) - \(\delta _{S})\), and within the statistical uncertainty in the transformation for \(\delta _{\perp }\). The difference in the log-likelihoods, \(-~2 \Delta { \ln (L)}\), between the two solutions is equal to 0.03, favouring (a) but without ruling out (b).

As discussed in Sect. 7.1, the two-fold nature of the likelihood maxima for the strong phases has a negligible effect on all the other variables. Figure 10 shows the one-dimensional likelihood scans for all other physics parameters, separately for the two solutions (a) and (b). For each scan, the other physics parameters and all nuisance parameters are optimised in a profile-likelihood fit. The 1D scans are almost identical for the two solutions.

8 Combination with 7 \(\text {Te}\text {V}\) and 8 \(\text {Te}\text {V}\) results

The measured values of solution (a) and solution (b) are consistent with those obtained in the previous analysis [13] using \( 19.2\, \mathrm {fb^{-1}} \) of data collected at \(\sqrt{s}\) = 7 \(\text {Te}\text {V}\) and 8 \(\text {Te}\text {V}\). A best linear unbiased estimator (BLUE) [50, 51] is used to combine the current results with those from the previous analysis. The measured values, uncertainties, and correlations are taken from the measurements performed at each centre-of-mass energy. The statistical correlation between these three measurements is zero as the events are different. The correlations of the systematic uncertainties between the three measurements are estimated and tested in several categories depending on whether the given systematic effect changed significantly between the measurements. Solution (a) and solution (b) are treated separately, leading to the two sets of combined results as shown in Table 9. The correlation matrices of these two combinations are shown in Tables 10 and 11.

Fig. 9
figure 9

Two-dimensional constraints on the values of \(\delta _{||}\) and \(\delta _{\perp }\) for solutions (a) and (b) at the level of \(-~2 \Delta { \ln (L)} = 2.30, 6.18,\) and 11.83 respectively created using a full 2D scan. The minimum of the solution (b) is \(-~2 \Delta { \ln (L)} = 0.03\) higher than the minimum of the solution (a)

Table 7 Fit correlations between the physical parameters of interest, obtained from the fit for solution (a)
Table 8 Fit correlations between the physical parameters of interest, obtained from the fit for solution (b)

The two-dimensional likelihood contours in the \(\phi _{s} \)\( \Delta \Gamma _{s}\) plane for the ATLAS result based on 7 and 8 \(\text {Te}\text {V}\) data, the solution (a) of the 13 \(\text {Te}\text {V}\) measurement, and the combined result for the solution (a) are shown in Fig. 11. The statistical and systematic uncertainties are combined in quadrature and correlations are taken into account in the construction of Gaussian contours. Because there is no significant difference in the \(\phi _{s} \) and \( \Delta \Gamma _{s}\) values between solution (a) and solution (b), only contours for solution (a) are shown.

Fig. 10
figure 10

1D log-likelihood scans of all other variables of the fit for the solution (a) (blue) and the solution (b) (dashed red). The variable on the vertical axis, \(-2 \Delta { \ln (L)} = 2 { (\ln (L^{a}) - \ln (L^i))} \), is the difference between the likelihood values of the default fit, \( {(L^a)}\), and of the fit in which the physical parameter is fixed to values shown on the horizontal axis

Two-dimensional likelihood contours in the \(\phi _{s} \)\( \Delta \Gamma _{s}\) plane are shown in Fig. 12 for this ATLAS result, the result from CMS [17] using the \( B_{s}^{0} \rightarrow J/\psi \phi \) decay, the result from LHCb [16] using the \(B_{s}^{0} \rightarrow J/\psi K^+K^-\) decay and finally the LHCb result including all \(B_{s}^{0}\) channels [16, 18,19,20,21]. The contours are obtained by interpreting each result as a two-dimensional Gaussian probability distribution in the \(\phi _{s} \)\( \Delta \Gamma _{s}\) plane. All results are consistent with each other and with the SM [5, 52].

9 Summary

This paper presents a measurement of the time-dependent CP asymmetry parameters in \( B_{s}^{0} \rightarrow J/\psi \phi \) decays from an \( 80.5\, \mathrm {fb^{-1}} \) data sample of pp collisions collected with the ATLAS detector during the 13 \(\text {Te}\text {V}\) LHC run. The values from the 13 \(\text {Te}\text {V}\) analysis are consistent with those obtained in the previous ATLAS analysis using 7 \(\text {Te}\text {V}\) and 8 \(\text {Te}\text {V}\) data. The two measurements are statistically combined.

Table 9 Values of the physical parameters extracted in the combination of solution (a) and solution (b) of 13 \(\text {Te}\text {V}\) results with those obtained from 7 and 8 \(\text {Te}\text {V}\) data
Table 10 Correlation matrix of the BLUE combination of the 7 and 8 \(\text {Te}\text {V}\) results and the solution (a) of the 13 \(\text {Te}\text {V}\) result
Table 11 Correlation matrix of the BLUE combination of the 7 and 8 \(\text {Te}\text {V}\) results and the solution (b) of the 13 \(\text {Te}\text {V}\) result

The CP-violating phase \(\phi _{s} \) is measured to be \(-0.087\) ± 0.036 (stat.) ± 0.021 (syst.) rad, the decay width difference between heavy and light \(B_s^0\) mass eigenstates, \( \Delta \Gamma _{s}\)= 0.0657 ± 0.0043 (stat.) ± 0.0037 (syst.) \(\mathrm {ps}^{-1}\) and the measurement for their average decay width, \( \Gamma _{s}\)= 0.6703 ± 0.0014 (stat.) ± 0.0018 (syst.) \(\mathrm {ps}^{-1}\). The measurement of the CP-violating phase \(\phi _{s} \) is consistent with the Standard Model prediction, and it improves on the precision of previous ATLAS measurements, while for the average decay width, \( \Gamma _{s}\), the comparison of the 13 \(\text {Te}\text {V}\) analysis with the current world combined value reveals a tension at the level of \(3\sigma \).

Fig. 11
figure 11

Contours of 68% confidence level in the \(\phi _{s} \)\( \Delta \Gamma _{s}\) plane, showing ATLAS results for 7 and 8 \(\text {Te}\text {V}\) data (blue dashed-dotted curve), for 13 \(\text {Te}\text {V}\) data (green dashed curve) and for 13 \(\text {Te}\text {V}\) data combined with 7 and 8 \(\text {Te}\text {V}\) (red solid curve) data. The Standard Model prediction [5, 52] is shown as a very thin black rectangle. In all contours the statistical and systematic uncertainties are combined in quadrature and correlations are taken into account

Fig. 12
figure 12

Contours of 68% confidence level in the \(\phi _{s} \)\( \Delta \Gamma _{s}\) plane, including results from CMS (orange) and LHCb (green) using the \(B_{s}^{0} \rightarrow J/\psi K^+K^-\) decay only and LHCb (red) for all the channels. The blue contour shows the ATLAS result for 13 \(\text {Te}\text {V}\) combined with 7 and 8 \(\text {Te}\text {V}\). The Standard Model prediction [5, 52] is shown as a very thin black rectangle. In all contours the statistical and systematic uncertainties are combined in quadrature